# Research Data "Generative AI predicts human performance"

This README file documents the research data for an analysis conducted on the effects of context on the relatedness and memorability of garden-path sentences. Below are details about the method, data sources, participants, material, and procedure used in this study.

## Ethics

The experiment received approval from the local ethics committee of the Leibniz-Institut für Wissensmedien (LEK 2023/051).

## Data Sources

### Machine/LLM Data

- **Source:** OpenAI's API, accessing ChatGPT (model: GPT-4)
	- "related" and "memorable" prompts (main analysis): June 2023
	- "linked" and "recognizable" prompts (robustness check analysis): March 2024
- **Data Collection:** 100 responses were collected for each sentence pair, concerning their relatedness and memorability values.
- **Settings:** The temperature value was set to 1 to increase the variability in the answers.
- **Total Responses:** 9000 responses were collected (for each main and robustness check analysis), with 4500 in the fitting condition and 4500 in the unfitting condition.

### Participants

- **Recruitment:** 100 English-only speaking participants were recruited via Prolific.
- **Demographics:** Out of the participants, 15 indicated non-normal or non-corrected-to-normal vision. The final sample consisted of 85 participants (57 female, 27 male, 1 without gender response), with a mean age of M = 45.34 years (SD = 13.95).

## Material

- **Sentences:** The study compiled a list of 45 garden-path sentences and constructed fitting and unfitting context sentences for each.
- **Exclusion:** For human data, one sentence ("The government plans to raise taxes were defeated.") was omitted for counter-balancing reasons.

## Procedure

### Machine Data

- **Prompts:** Zero-shot prompts were submitted to ChatGPT.
- **Variability:** To increase data variability, the temperature parameter was set to 1.0.

### Human Experiment

- **Programming:** The experiment was programmed with PsychoPy (Peirce, 2022).
- **Phase:** Participants completed a learning phase in which they rated the relatedness of the two sentences on a scale from 1 ("not at all") to 10 ("perfectly"), followed by a surprise old/new-recognition memory test.
- **Design:** The study employed a one-factorial within-subject design with context (fitting, unfitting) as the within-subject factor.

## Files and Folder Structure

- `01-Machine_data/linked_prompt/00-Skript_2024-02-22.R`: R script containing the data analysis of the machine data (robustness check analysis).
- `01-Machine_data/linked_prompt/output_linked.csv`: Answers from ChatGPT.
- `01-Machine_data/linked_prompt/input.csv`: Prompts submitted to ChatGPT.
- `01-Machine_data/linked_prompt/input_structure.csv`: Structure data describing the experimental design.
- `01-Machine_data/memorable-related_prompt/00-Skript_2024-02-22.R`: R script containing the data analysis of the machine data (main analysis).
- `01-Machine_data/memorable-related_prompt/00dat.csv`: Raw data containing the prompts and responses submitted to ChatGPT (via the API).
- `01-Machine_data/recognizable_prompt/00-Skript_2024-02-22.R`: R script containing the data analysis of the machine data (robustness check analysis).
- `01-Machine_data/recognizable_prompt/output_recognizability.csv`: Answers from ChatGPT.
- `01-Machine_data/recognizable_prompt/input.csv`: Prompts submitted to ChatGPT
- `01-Machine_data/recognizable_prompt/dat_recognizability.csv`: Structure data describing the experimental design.
- `02-Human_data/PsychoPy-Experiments.zip`: ZIP folder containing the PsychPy versions of the human experiments (4 counter-balancing versions)
- `02-Human_data/Analysis/00-Skript_2024-02-22.R`: R script containing the data analysis of the human data.
- `02-Human_data/Analysis/data/`: raw data (CSV files)
- `03-Figures/`: Figures for the manuscript.

## License 

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.


